Benefit of Proper Language Processing for Czech Speech Retrieval in the CL-SR Task at CLEF
Abstract
This paper describes the system built by the team from the University of West Bohemia for participation in the CLEF 2006 CL-SR track. We decided to concentrate only on monolingual searching in the Czech test collection and to investigate the effect of proper language processing on retrieval performance. We employed a Czech morphological analyser and tagger for that purpose. For the actual search system, we used the classical tf.idf approach with blind relevance feedback as implemented in the Lemur toolkit. The results indicate that suitable linguistic preprocessing is indeed crucial for Czech IR performance.
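As a rough illustration of the retrieval idea named in the abstract, the following minimal Python sketch ranks lemmatized documents by tf.idf and then applies one round of blind relevance feedback. It is not the Lemur implementation and not the authors' configuration: the function names, the toy lemmatized documents, the `log(N/df) + 1` idf variant, the number of feedback documents, and the 0.5 weight on expansion terms are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one tf.idf weight dict per tokenised document, plus the idf table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed idf variant: log(N / df) + 1, so every term keeps a positive weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def rank(query_weights, vectors):
    """Return document indices sorted by descending dot-product score."""
    def score(i):
        return sum(w * vectors[i].get(t, 0.0) for t, w in query_weights.items())
    return sorted(range(len(vectors)), key=score, reverse=True)


def blind_feedback(query_terms, docs, top_docs=1, extra_terms=2, beta=0.5):
    """Rank, expand the query from the top-ranked documents, then rank again."""
    vectors, idf = tfidf_vectors(docs)
    query = {t: idf.get(t, 0.0) for t in query_terms}
    first_pass = rank(query, vectors)

    # Blind feedback: treat the top documents as relevant without any judgement.
    pooled = Counter()
    for i in first_pass[:top_docs]:
        pooled.update(vectors[i])
    expansion = [t for t, _ in pooled.most_common() if t not in query][:extra_terms]
    for t in expansion:
        query[t] = beta * idf[t]  # hypothetical down-weighting of added terms

    return rank(query, vectors), expansion


# Hypothetical, already lemmatized toy collection (one token list per document).
docs = [
    ["válka", "tábor", "rodina"],
    ["tábor", "osvobození", "armáda"],
    ["škola", "rodina", "dětství"],
    ["osvobození", "armáda", "voják"],
]
ranking, expansion = blind_feedback(["tábor", "osvobození"], docs)
print(ranking, expansion)
```

The sketch assumes its input is already lemmatized. In a morphologically rich language such as Czech, inflected forms of the same word would otherwise be counted as unrelated terms, which is the effect the paper sets out to measure.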
Similar resources
Benefit of Proper Language Processing for Czech Speech Retrieval in the CL-SR Task at CLEF 2006
This paper describes the system built by the team from the University of West Bohemia for participation in the CLEF 2006 CL-SR track. We decided to concentrate only on monolingual searching in the Czech test collection and to investigate the effect of proper language processing on retrieval performance. We employed a Czech morphological analyser and tagger for that purpose. For...
University of Chicago at the CLEF 2007 Cross Language Speech Retrieval Track
The University of Chicago participated in the CLEF 2007 CL-SR track, performing monolingual retrieval for both English and Czech and cross-language French-English retrieval. English experiments considered the impact of automatically generated keywords on retrieval. Czech experiments explored the effect of different stemming approaches on retrieval for this morphologically rich language. The bes...
Charles University at CLEF 2007 CL-SR Track
This paper describes a system built at Charles University in Prague for participation in the CLEF 2007 Cross-Language Speech Retrieval track. We focused only on monolingual searching of the Czech collection and used the LEMUR toolkit as the retrieval system. We employed our own morphological tagger and lemmatized the collection before indexing to deal with the rich morphology in Czech which significa...
Experiments for the Cross Language Speech Retrieval Task at CLEF 2006
This paper presents the second participation of the University of Ottawa group in the Cross-Language Speech Retrieval (CL-SR) task at CLEF 2006. We present the results of the submitted runs for the English collection and very briefly for the Czech collection, followed by many additional experiments. We have used two Information Retrieval systems in our experiments: SMART and Terrier, with sever...
Brown at CL-SR'07: Retrieving Conversational Speech in English and Czech
Brown’s entry to the Cross-Language Speech Retrieval (CL-SR) track at the 2007 Cross Language Evaluation Forum (CLEF) was based on the language model (LM) paradigm for retrieval [17]. For English, our system introduced two minor enhancements to the basic unigram: we extended Dirichlet smoothing (popular with unigram modeling) to bigrams, and we smoothed the collection LM to compensate for the s...
Publication date: 2015